In Section @ref(cell-quality) we observed staining/expression differences between the individual samples. This can arise due to technical (e.g., differences in sample processing) as well as biological (e.g. differential expression between patients/indications) reasons. However, the combination of these effects hinders cell phenotyping via clustering as highlighted in Section @ref(clustering).
To integrate cells across samples, we can use computational strategies developed for correcting batch effects in single-cell RNA sequencing data. In the following sections, we will use functions of the batchelor, harmony and Seurat packages to correct for such batch effects.
We will only test harmony, as the other two took several days!!
Of note: the correction approaches presented here aim at removing any differences between samples. This will also remove biological differences between the patients/indications. Nevertheless, integrating cells across samples can facilitate the detection of cell phenotypes via clustering.
First, we will read in the SpatialExperiment object
containing the single-cell data.
When a code chunk is time-consuming to run, you may consider caching
it via the chunk option cache = TRUE. When the cache is turned on,
knitr will skip the execution of this code chunk if it has
been executed before and nothing in the code chunk has changed since
then. This is explained in more detail here.
However cache=TRUE did not work and was computing batch correction
anyway. Thus the chunk was commented.
path <- "/mnt/Multimodal_Imaging_Daria/_Data_Analysis/_tmp_daria/Image_analysis/20220723_Steinbock_Mesmer_IFbased"
spe <- readRDS(file.path(path,"data/spe.rds"))
The harmony algorithm performs batch correction by
iteratively clustering and correcting the positions of cells in PCA
space [@Korsunsky2019]. It requires a
matrix of transformed expression counts and internally performs PCA
before kmeans clustering. We will first create the expression matrix and
call the HarmonyMatrix function to perform the
correction.
Paper: Fast, sensitive
and accurate integration of single-cell data with Harmony
Documentation: harmony
Similar to the fastMNN function, harmony
returns the corrected low-dimensional coordinates for each cell. These
can be saved in the reducedDim slot of the original
SpatialExperiment object.
#library(harmony)
#mat <- t(assay(spe, "exprs")[rowData(spe)$use_channel,])
#harmony_emb <- HarmonyMatrix(mat, spe$sample, do_pca = TRUE)
#reducedDim(spe, "harmony") <- harmony_emb
The computational time of the HarmonyMatrix function
call is 0 minutes.
We will now again visualize the cells in low dimensions after UMAP embedding.
#set.seed(220228)
#detectCores(all.tests = FALSE, logical = TRUE) # find out how many logical cores available
#spe <- runUMAP(spe, dimred = "harmony", name = "UMAP_harmony", BPPARAM = MulticoreParam())
# visualize sample
library(cowplot)
library(dittoSeq)
library(viridis)
library(RColorBrewer)
n <- length(metadata(spe)$color_vectors$sample)
qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
col_vector = unlist(mapply(brewer.pal, qual_col_pals$maxcolors, rownames(qual_col_pals)))
metadata(spe)$color_vectors$sample <- sample(col_vector, n)
p1 <- dittoDimPlot(spe, var = "sample",
reduction.use = "UMAP", size = 0.2, legend.size=2) +
scale_color_manual(values = metadata(spe)$color_vectors$sample) +
ggtitle("Sample ID on UMAP before correction") +
theme(legend.text=element_text(size=2))
p2 <- dittoDimPlot(spe, var = "sample",
reduction.use = "UMAP_harmony", size = 0.2, legend.size=2) +
scale_color_manual(values = metadata(spe)$color_vectors$sample) +
ggtitle("Sample ID on UMAP after correction") +
theme(legend.text=element_text(size=2))
plot_grid(p1)
plot_grid(p2)
And we visualize selected marker expression as defined above.
# Before correction
plot_list <- multi_dittoDimPlot(spe, var = rownames(spe)[rowData(spe)$use_channel], reduction.use = "UMAP",
assay = "exprs", size = 0.2, list.out = TRUE)
plot_list <- lapply(plot_list, function(x) x + scale_color_viridis())
plot_grid(plotlist = plot_list, nrow=13, ncol=3)
# After correction
plot_list <- multi_dittoDimPlot(spe, var = rownames(spe)[rowData(spe)$use_channel], reduction.use = "UMAP_harmony",
assay = "exprs", size = 0.2, ncol=3, nrow=13, list.out = TRUE)
plot_list <- lapply(plot_list, function(x) x + scale_color_viridis())
plot_grid(plotlist = plot_list, nrow=13, ncol=3)
We observe a more aggressive merging of cells from different samples
compared to the results after fastMNN correction. What we
can see in both, corrected and uncorrected, is that T-cells appear in
the similar region with expression of CD3, CD8, CD4, CD279 and Granzyme
B. CD45 and CD15 are everywhere even in the tumor regions (GD2+, CD56+,
GATA3+, ELAVL4+), which shouldn’t be the case. HLA-DR, CD14 and CD11c
and CD274 come up in similar regions. CD24, CD10 regions could be
immature neutrophils. In general, correction causes CD15, CD45 and
HLA-ABC to be even more homogeneously expressed, while it improves the
segregation of other cell types like tumor cells.
Seurat and fastMNN were tested but required too much computing time (several days), and were stopped in the end.
Use plan(“multicore”) to parallelize Seurat and adjust
future.globals.maxSize if required, as explained here.
Choosing the correct integration approach is challenging without having ground truth cell labels available. It is recommended to compare different techniques and different parameter settings. Please refer to the documentation of the individual tools to become familiar with the possible parameter choices. Furthermore, in the following section, we will discuss clustering and classification approaches in light of expression differences between samples.
In general, it appears that MNN-based approaches are less
conservative in terms of merging compared to harmony. On
the other hand, harmony could well merge cells in a way
that regresses out biological signals.
The modified SpatialExperiment object is saved for
further downstream analysis.
saveRDS(spe, file.path(path, "data/spe.rds"))
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RColorBrewer_1.1-3 viridis_0.6.2
## [3] viridisLite_0.4.0 dittoSeq_1.8.1
## [5] cowplot_1.1.1 scater_1.24.0
## [7] ggplot2_3.3.6 scuttle_1.6.2
## [9] imcRtools_1.3.5 SpatialExperiment_1.6.1
## [11] SingleCellExperiment_1.18.0 SummarizedExperiment_1.26.1
## [13] Biobase_2.56.0 GenomicRanges_1.48.0
## [15] GenomeInfoDb_1.32.3 IRanges_2.30.0
## [17] S4Vectors_0.34.0 BiocGenerics_0.42.0
## [19] MatrixGenerics_1.8.1 matrixStats_0.62.0
## [21] BiocParallel_1.30.3
##
## loaded via a namespace (and not attached):
## [1] systemfonts_1.0.4 plyr_1.8.7
## [3] igraph_1.3.4 sp_1.5-0
## [5] shinydashboard_0.7.2 digest_0.6.29
## [7] htmltools_0.5.3 magick_2.7.3
## [9] tiff_0.1-11 fansi_1.0.3
## [11] magrittr_2.0.3 ScaledMatrix_1.4.0
## [13] tzdb_0.3.0 limma_3.52.2
## [15] readr_2.1.2 graphlayouts_0.8.0
## [17] svgPanZoom_0.3.4 vroom_1.5.7
## [19] R.utils_2.12.0 svglite_2.1.0
## [21] jpeg_0.1-9 colorspace_2.0-3
## [23] ggrepel_0.9.1 xfun_0.32
## [25] dplyr_1.0.9 crayon_1.5.1
## [27] RCurl_1.98-1.8 jsonlite_1.8.0
## [29] glue_1.6.2 polyclip_1.10-0
## [31] gtable_0.3.0 nnls_1.4
## [33] zlibbioc_1.42.0 XVector_0.36.0
## [35] DelayedArray_0.22.0 BiocSingular_1.12.0
## [37] DropletUtils_1.16.0 Rhdf5lib_1.18.2
## [39] HDF5Array_1.24.2 abind_1.4-5
## [41] scales_1.2.0 pheatmap_1.0.12
## [43] DBI_1.1.3 edgeR_3.38.4
## [45] Rcpp_1.0.9 xtable_1.8-4
## [47] units_0.8-0 dqrng_0.3.0
## [49] rsvd_1.0.5 bit_4.0.4
## [51] proxy_0.4-27 DT_0.24
## [53] htmlwidgets_1.5.4 ellipsis_0.3.2
## [55] pkgconfig_2.0.3 R.methodsS3_1.8.2
## [57] farver_2.1.1 sass_0.4.2
## [59] locfit_1.5-9.6 utf8_1.2.2
## [61] labeling_0.4.2 tidyselect_1.1.2
## [63] rlang_1.0.4 RTriangle_1.6-0.10
## [65] later_1.3.0 munsell_0.5.0
## [67] tools_4.2.0 cachem_1.0.6
## [69] cli_3.3.0 generics_0.1.3
## [71] ggridges_0.5.3 evaluate_0.16
## [73] stringr_1.4.0 fastmap_1.1.0
## [75] fftwtools_0.9-11 yaml_2.3.5
## [77] bit64_4.0.5 knitr_1.39
## [79] tidygraph_1.2.1 purrr_0.3.4
## [81] ggraph_2.0.6 sparseMatrixStats_1.8.0
## [83] mime_0.12 R.oo_1.25.0
## [85] concaveman_1.1.0 compiler_4.2.0
## [87] rstudioapi_0.13 beeswarm_0.4.0
## [89] png_0.1-7 e1071_1.7-11
## [91] tibble_3.1.8 tweenr_1.0.2
## [93] bslib_0.4.0 stringi_1.7.8
## [95] highr_0.9 cytomapper_1.9.1
## [97] lattice_0.20-45 Matrix_1.4-1
## [99] classInt_0.4-7 vctrs_0.4.1
## [101] pillar_1.8.0 lifecycle_1.0.1
## [103] rhdf5filters_1.8.0 jquerylib_0.1.4
## [105] BiocNeighbors_1.14.0 irlba_2.3.5
## [107] data.table_1.14.2 bitops_1.0-7
## [109] raster_3.5-21 httpuv_1.6.5
## [111] R6_2.5.1 promises_1.2.0.1
## [113] KernSmooth_2.23-20 gridExtra_2.3
## [115] vipor_0.4.5 codetools_0.2-18
## [117] MASS_7.3-56 assertthat_0.2.1
## [119] rhdf5_2.40.0 rjson_0.2.21
## [121] withr_2.5.0 GenomeInfoDbData_1.2.8
## [123] parallel_4.2.0 hms_1.1.1
## [125] EBImage_4.38.0 terra_1.6-7
## [127] grid_4.2.0 beachmat_2.12.0
## [129] class_7.3-20 tidyr_1.2.0
## [131] rmarkdown_2.14 DelayedMatrixStats_1.18.0
## [133] sf_1.0-8 ggforce_0.3.3
## [135] shiny_1.7.2 ggbeeswarm_0.6.0